User Behavior Research & A/A/B Test for an Online Food Store

I work at a startup that sells food products.
I need to investigate user behavior in the company's app.
Based on the test results we will decide whether the change proposed by the developers
improves the conversion rate at each event in the funnel.

How?

User Behavior-

  1. First, I'll study the sales funnel and find out how users reach the purchase stage.
  2. How many users actually make it to the purchase stage?
  3. How many get stuck at earlier stages, and at which stages in particular?

Tests-

  1. I'll look at the results of an A/A/B test and see whether changing the fonts for the entire app affects the purchase rate.
  2. Creating two A groups has certain advantages: when the two control groups turn out similar, we can be confident in the accuracy of our testing.
  3. Afterwards, I'll compare the control groups, which will also tell us how much time and data we'll need when running further tests.

Open the data file and read the general information

In [2]:
import pandas as pd
import numpy as np
from datetime import datetime

from scipy import stats
import math

import warnings

import plotly.express as px
import plotly.graph_objects as go

df = pd.read_csv('/datasets/logs_exp_us.csv',sep='\t')
In [3]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244126 entries, 0 to 244125
Data columns (total 4 columns):
EventName         244126 non-null object
DeviceIDHash      244126 non-null int64
EventTimestamp    244126 non-null int64
ExpId             244126 non-null int64
dtypes: int64(3), object(1)
memory usage: 7.5+ MB
In [4]:
df.head(3)
Out[4]:
EventName DeviceIDHash EventTimestamp ExpId
0 MainScreenAppear 4575588528974610257 1564029816 246
1 MainScreenAppear 7416695313311560658 1564053102 246
2 PaymentScreenSuccessful 3518123091307005509 1564054127 248

Checking for zero values-

In [5]:
def zeros(data):
    # print the number of zero values per column (and their share when nonzero)
    for i in data.columns:
        zero_count = len(data[data[i] == 0])
        if zero_count == 0:
            print(i, zero_count)
        else:
            print(i, zero_count, round(zero_count / len(data[i]), 3))
In [6]:
zeros(df)
EventName 0
DeviceIDHash 0
EventTimestamp 0
ExpId 0
In [7]:
df.describe(include = 'all')
Out[7]:
EventName DeviceIDHash EventTimestamp ExpId
count 244126 2.441260e+05 2.441260e+05 244126.000000
unique 5 NaN NaN NaN
top MainScreenAppear NaN NaN NaN
freq 119205 NaN NaN NaN
mean NaN 4.627568e+18 1.564914e+09 247.022296
std NaN 2.642425e+18 1.771343e+05 0.824434
min NaN 6.888747e+15 1.564030e+09 246.000000
25% NaN 2.372212e+18 1.564757e+09 246.000000
50% NaN 4.623192e+18 1.564919e+09 247.000000
75% NaN 6.932517e+18 1.565075e+09 248.000000
max NaN 9.222603e+18 1.565213e+09 248.000000
In [8]:
print('There are',df.EventName.nunique(),'unique events:\n',df.EventName.unique())
There are 5 unique events:
 ['MainScreenAppear' 'PaymentScreenSuccessful' 'CartScreenAppear'
 'OffersScreenAppear' 'Tutorial']

Step 1. Summary-

  1. I will need to rename the columns to more convenient names.
  2. There are no null or zero values.
  3. As expected, the most frequent event is "MainScreenAppear"; it constitutes almost 50% of all events.
  4. The time is in Unix format; a column with a readable date and time needs to be added.
  5. There are 5 types of events.
  6. I will perform a duplicate-values check in the next part.

Go Up.

Step 2. Prepare the data for analysis

Rename the columns-

In [9]:
df.columns = ['event_name','user_id','timestamp','experiment_id']
df.head(2)
Out[9]:
event_name user_id timestamp experiment_id
0 MainScreenAppear 4575588528974610257 1564029816 246
1 MainScreenAppear 7416695313311560658 1564053102 246

Convert the timestamp and add separate date and time columns-

In [10]:
df['timestamp'] = df['timestamp'].apply(lambda x: datetime.fromtimestamp(x))
df['date'] = df['timestamp'].dt.date
df['time'] = df['timestamp'].dt.time
df.head(2)
Out[10]:
event_name user_id timestamp experiment_id date time
0 MainScreenAppear 4575588528974610257 2019-07-25 04:43:36 246 2019-07-25 04:43:36
1 MainScreenAppear 7416695313311560658 2019-07-25 11:11:42 246 2019-07-25 11:11:42
In [11]:
df.timestamp.describe()
Out[11]:
count                  244126
unique                 176654
top       2019-08-04 16:23:19
freq                        9
first     2019-07-25 04:43:36
last      2019-08-07 21:15:17
Name: timestamp, dtype: object

The first date documented is 2019-07-25 and the last is 2019-08-07.
In total the logs span 13 days.

Duplicates check-

In [12]:
df.duplicated().sum()
Out[12]:
413
In [13]:
for i in df[df.duplicated()].columns:
    print(i,':',df[df.duplicated()][i].nunique())
event_name : 5
user_id : 237
timestamp : 352
experiment_id : 3
date : 9
time : 352
In [14]:
df.drop_duplicates(inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 243713 entries, 0 to 244125
Data columns (total 6 columns):
event_name       243713 non-null object
user_id          243713 non-null int64
timestamp        243713 non-null datetime64[ns]
experiment_id    243713 non-null int64
date             243713 non-null object
time             243713 non-null object
dtypes: datetime64[ns](1), int64(2), object(3)
memory usage: 13.0+ MB

Step 2. Summary-

  1. There were 413 duplicate rows; this can be problematic in A/B testing.
  2. We checked the duplicates in each column to see whether they were concentrated in a specific column or date. We found that they occur across all columns, so we had to drop those rows.
  3. We must of course also check whether any users appear in more than one group. We will verify this before performing the A/A test.

Go Up.

Step 3. Study and check the data

In this section we will get to know the data in more depth.
We will calculate some statistics, filter the relevant time period for the test,
and examine whether and how much the filtering affected the data and the experimental groups.

How many events are in the logs?

In [15]:
print('There are {} unique events in the logs , and {:,} events overall.'.format(df.event_name.nunique(),df.shape[0]))
There are 5 unique events in the logs , and 243,713 events overall.

How many users are in the logs?

In [16]:
print('There are {:,} unique users in the logs.'.format(df.user_id.nunique()))
There are 7,551 unique users in the logs.

What's the average number of events per user?

In [17]:
print('The average number of events per user is: {:.2f}.'.format(df.groupby('user_id')['event_name'].count().mean()))
The average number of events per user is: 32.28.

Date Time statistics & Histogram

We will examine whether all the information in the logs can be used, in terms of the frequency of events relative to the date-

In [18]:
date_max = df.date.max()
date_min = df.date.min()
delta = date_max-date_min

print('The earliest date that appears in Logs is: {}, and the latest date is: {}.'.format(date_min,date_max))
print('The data covers a period of {} days.'.format(delta.days))
The earliest date that appears in Logs is: 2019-07-25, and the latest date is: 2019-08-07.
The data covers a period of 13 days.
In [19]:
fig = px.histogram(df, x="timestamp", opacity  = 0.7)
fig.update_layout(
    title_text='Events by Date & Time Histogram', 
    xaxis_title_text='Events',
    yaxis_title_text='Frequency')
fig.update_xaxes(
    dtick=86400000,  # one day, in milliseconds
    tickformat="%d %B\n%Y")


fig.update_traces(marker_color='cadetblue')
fig.show()

The histogram shows quite clearly that the data only becomes complete on the first of August.
To avoid skewing the overall picture, the data recorded before this date should be ignored.
The data actually represents the dates 2019-08-01 to 2019-08-07.

Go Up.

Period investigation (Users, Events, Proportion)-

In the previous section we found that the period that can be processed runs from the first of August until the end of the logs.
We will now filter the data and examine how much information we lost in terms of users and events.

In [20]:
n_df =df.query('timestamp > "2019-07-31 23:59:59"')
n_df.head(2)
Out[20]:
event_name user_id timestamp experiment_id date time
2828 Tutorial 3737462046622621720 2019-08-01 00:07:28 246 2019-08-01 00:07:28
2829 MainScreenAppear 3737462046622621720 2019-08-01 00:08:00 246 2019-08-01 00:08:00

Users-

Now, we check how many users we lose when filtering the older data:

In [21]:
filt_user_id = df.query('timestamp <= "2019-08-01 00:00:00"')['user_id']
unq_filt_user_id = filt_user_id.nunique()-n_df.loc[n_df['user_id'].isin(filt_user_id)].user_id.nunique()

print('We lost {} unique users; their share of all unique users is: {:.4%}.'.format(
                                                                    unq_filt_user_id,unq_filt_user_id/df.user_id.nunique()))
We lost 17 unique users; their share of all unique users is: 0.2251%.

Events-

Now, we'll check how many events we lost:

In [22]:
filt_event_id = df.query('timestamp <= "2019-08-01 00:00:00"')['timestamp'].nunique()

print('We lost {} unique events; their share of total events is: {:.4%}'.format(filt_event_id, filt_event_id/df.timestamp.nunique()))
We lost 2610 unique events; their share of total events is: 1.4775%

Now, we'll check how many events we lost by event type:

In [23]:
#share of event types we lose:
event_type = df.event_name.value_counts().reset_index()
filt_event_type = df.query('timestamp <= "2019-08-01 00:00:00"').event_name.value_counts().reset_index()
filt_event_type['share'] = round(filt_event_type.event_name/event_type.event_name,4)
filt_event_type.columns = ['event_name','number','share']
print('The number of events we lost by type, and their share: \n\n{}'.format(filt_event_type)) 
The number of events we lost by type, and their share: 

                event_name  number   share
0         MainScreenAppear    1773  0.0149
1       OffersScreenAppear     475  0.0101
2         CartScreenAppear     365  0.0086
3  PaymentScreenSuccessful     200  0.0059
4                 Tutorial      13  0.0128

In total we lose 1.47% of events by excluding the data before Aug 1st.
In terms of event type:
We lose 1.5% of main screen events, 1% of the offers screen, 0.9% of the cart screen, 0.6% of the payment screen and 1.3% of tutorial events.

Now, we'll check how many events we lost in terms of groups (246,247,248):

In [24]:
g_before = df.experiment_id.value_counts()
g_after = n_df.experiment_id.value_counts()
print('Number of events before filtering-')
display(g_before)
print('Number of events after filtering-')
display(g_after)
print('The share of events remaining in each group after filtering:\n\n{}'.format(g_after/g_before))
Number of events before filtering-
248    85582
246    80181
247    77950
Name: experiment_id, dtype: int64
Number of events after filtering-
248    84563
246    79302
247    77022
Name: experiment_id, dtype: int64
The share of events remaining in each group after filtering:

248    0.988093
246    0.989037
247    0.988095
Name: experiment_id, dtype: float64

We do not appear to have lost many events in any experimental group (less than 2% per group).

Proportion-

To calculate proportions we need to know the number of users in each group.

Let's check the number of users in every group before and after filtering, and the ratio:

In [25]:
before_g = df.groupby('experiment_id')['user_id'].nunique().reset_index()
after_g = n_df.groupby('experiment_id')['user_id'].nunique().reset_index()
groups = before_g.merge(after_g , how= 'left', on='experiment_id')
groups.columns = ['experiment_id','before','after']
groups['ratio'] = groups.after/groups.before
groups
Out[25]:
experiment_id before after ratio
0 246 2489 2484 0.997991
1 247 2520 2513 0.997222
2 248 2542 2537 0.998033

We do not appear to have lost many users in any experimental group (less than 0.3% per group).

Go Up.

Step 3. Conclusion-

Logs-

  • There are 5 unique events in the logs , and 243,713 events overall.
  • The average number of events per user is: 32.28.
  • The earliest date that appears in the logs is 2019-07-25, and the latest is 2019-08-07.
  • The data covers a period of 13 days.
  • The data actually represents the dates 2019-08-01 to 2019-08-07.

Users-

  • We lost 17 unique users; their share of all unique users is 0.2251%.

Events-

  • We lost 2610 unique events; their share of total events is 1.4775%.
  • In total we lose 1.47% of events by excluding the data before Aug 1st.
  • We do not appear to have lost many events in any experimental group (less than 2% per group).

After testing and filtering the data, there is no problem relying on the remaining data; it constitutes the vast majority of the logs.

Step 4. Study the event funnel

Go Up.

At this step we will study in depth the behavior of users in terms of conversion rates between events.
We will look at common sequences of events and the number of users who complete them, also from the perspective of the experimental groups.
In addition, we will identify the event after which the highest loss of users occurs.

What events are in the logs and their frequency

In [26]:
n_df.event_name.value_counts().reset_index()
Out[26]:
index event_name
0 MainScreenAppear 117328
1 OffersScreenAppear 46333
2 CartScreenAppear 42303
3 PaymentScreenSuccessful 33918
4 Tutorial 1005

As expected, the event that repeats the most is "MainScreenAppear"; it constitutes almost 50% of all events.

Go Up.

Number of users who performed each of these actions

In [27]:
event_users =n_df.groupby('event_name')['user_id'].nunique().reset_index().sort_values(by='event_name',ascending=False)
event_users
Out[27]:
event_name user_id
4 Tutorial 840
3 PaymentScreenSuccessful 3539
2 OffersScreenAppear 4593
1 MainScreenAppear 7419
0 CartScreenAppear 3734
  • An encouraging statistic: out of 7,419 unique users, almost 50% reached the payment stage.
  • It seems that the Tutorial is not very popular. Do our users already know how to do everything? Or, at worst, is the explanation unhelpful or even annoying?

Go Up.

Proportion of users who performed each action exactly once vs. more than once

In [28]:
once = n_df.groupby(['user_id','event_name'])['timestamp'].count().reset_index().query('timestamp==1')
once = once.groupby('event_name')['user_id'].nunique().reset_index().sort_values(by='event_name', ascending=False) 
once
Out[28]:
event_name user_id
4 Tutorial 756
3 PaymentScreenSuccessful 574
2 OffersScreenAppear 681
1 MainScreenAppear 246
0 CartScreenAppear 472
In [29]:
several = n_df.groupby(['user_id','event_name'])['timestamp'].count().reset_index().query('timestamp >1')
several = several.groupby('event_name')['user_id'].nunique().reset_index().sort_values(by='event_name', ascending=False) 
several
Out[29]:
event_name user_id
4 Tutorial 84
3 PaymentScreenSuccessful 2965
2 OffersScreenAppear 3912
1 MainScreenAppear 7173
0 CartScreenAppear 3262
In [30]:
merged = event_users.merge(once, how = 'left', on = 'event_name')
merged.columns = ['event_name','count_all','count_once']
merged['prop_%'] = merged['count_once']/merged['count_all']*100
merged['twice_and_more_%'] = 100-merged['prop_%']
print('This is the proportion of users that performed the action exactly once (vs. more than once):\n\n{}'.format(merged[['event_name','prop_%','twice_and_more_%']]))
This is the proportion of users that performed the action exactly once (vs. more than once):

                event_name     prop_%  twice_and_more_%
0                 Tutorial  90.000000         10.000000
1  PaymentScreenSuccessful  16.219271         83.780729
2       OffersScreenAppear  14.826911         85.173089
3         MainScreenAppear   3.315811         96.684189
4         CartScreenAppear  12.640600         87.359400
  • 83% of users made more than one purchase, which is a great figure!
  • We lose only about 5% of users between the "CartScreenAppear" and "PaymentScreenSuccessful" events.

How many users completed all the steps on their way to the payment screen?

In [31]:
actions = n_df.groupby(['user_id','event_name'])['timestamp'].count().reset_index()
actions =actions.pivot_table(index='user_id', columns='event_name',values='timestamp', aggfunc='count').reset_index()
actions.dropna(inplace = True)
print('There are', actions.shape[0], 'users who made all the steps.\nThey make up {:.0%} of all users.'.format(actions.user_id.nunique()/n_df.user_id.nunique()))
There are 466 users who made all the steps.
They make up 6% of all users.

Go Up.

Sequence of event types-

We will now review the sequences that users performed on the way to payment.
For convenience we will define:

T = Tutorial
A = MainScreenAppear
B = OffersScreenAppear
C = CartScreenAppear
D = PaymentScreenSuccessful

We'll create a table of users and the steps they performed:

In [32]:
seq = n_df.groupby(['user_id','event_name'])['timestamp'].count().reset_index()
seq =seq.pivot_table(index='user_id', columns='event_name',values='timestamp', aggfunc='count').reset_index()
seq = seq[['user_id','MainScreenAppear','OffersScreenAppear','CartScreenAppear','PaymentScreenSuccessful','Tutorial']]
seq.columns = ['user_id','A','B','C','D','T']
seq.sample(3)
Out[32]:
user_id A B C D T
2696 3406331362586475046 1.0 1.0 1.0 1.0 NaN
392 518781617060869985 NaN 1.0 1.0 1.0 NaN
102 133486548927612375 1.0 1.0 1.0 1.0 1.0

We will now see the possible sequences and step-by-step conversion:

  • A->B->C->D
  • A->C->D
  • A->B->D
  • A->D
In [33]:
seq_A =seq.query('A==1')['user_id'].nunique()
seq_A_B =seq.query('A==1 & B==1')['user_id'].nunique()
seq_A_B_C =seq.query('A==1 & B==1 & C==1')['user_id'].nunique()
seq_A_B_C_D =seq.query('A==1 & B==1 & C==1 & D==1')['user_id'].nunique()
###
seq_A_C =seq.query('A==1 & C==1')['user_id'].nunique()
seq_A_C_D =seq.query('A==1 & C==1 & D==1')['user_id'].nunique()
###
seq_A_B_D = seq.query('A==1 & B==1 & D==1')['user_id'].nunique()
###
seq_A_D =seq.query('A==1 & D==1')['user_id'].nunique()
In [34]:
fig = go.Figure()

fig.update_layout(title="Funnels of Different Sequences")


fig.add_trace(go.Funnelarea(
            scalegroup = "second", values = [seq_A, seq_A_B,seq_A_B_C,seq_A_B_C_D], textinfo = "value",
            title = {"position": "top center", "text": "Seq: A->B->C->D"},
            marker = {"colors": ["deepskyblue", "tan", "teal", "silver"]},
            domain = {"x": [0, 0.5], "y": [0.55, 1]}))

fig.add_trace(go.Funnelarea(
            scalegroup = "second", values = [seq_A, seq_A_B,seq_A_B_D], textinfo = "value",
            title = {"position": "top center", "text": "Seq: A->B->D"},
            marker = {"colors": ["deepskyblue", "tan", "silver"]},
            domain = {"x": [0.55, 1], "y": [0.55, 1]}))

fig.add_trace(go.Funnelarea(
            scalegroup = "first", values = [seq_A, seq_A_C,seq_A_C_D],textinfo = "value",
            title = {"position": "top center", "text": "Seq: A->C->D"},
            marker = {"colors": ["deepskyblue","teal", "silver"]},
            domain = {"x": [0, 0.5], "y": [0,0.5]}))

fig.add_trace(go.Funnelarea(
            scalegroup = "first", values = [seq_A, seq_A_D],textinfo = "value ", 
            title = {"position": "top center", "text": "Seq: A->D"},
            marker = {"colors": ["deepskyblue","silver"]},
            domain = {"x": [0.55, 1], "y": [0,0.5]}))

fig.update_layout(
             margin = {"l": 200, "r": 200}, shapes = [
             {"x0": 0, "x1": 0.5, "y0": 0, "y1": 0.5},
             {"x0": 0, "x1": 0.5, "y0": 0.55, "y1": 1},
             {"x0": 0.55, "x1": 1, "y0": 0, "y1": 0.5},
             {"x0": 0.55, "x1": 1, "y0": 0.55, "y1": 1}])

fig.update_layout(legend=dict(title = "Event:\n 0=A,1=B,2=C,3=D"))

fig.show()

As we can see from the funnels above, they are not all part of a single sequence.
A user can reach the payment event through different sequences.
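Note that the pivot used above only records which events each user triggered, not the order in which they occurred. A minimal sketch (on a toy log with the same columns as `n_df`) of how the actual order could be recovered from first-occurrence timestamps:

```python
import pandas as pd

# toy log with the same schema as n_df (user_id, event_name, timestamp)
log = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'event_name': ['MainScreenAppear', 'OffersScreenAppear',
                   'PaymentScreenSuccessful',
                   'MainScreenAppear', 'PaymentScreenSuccessful'],
    'timestamp': pd.to_datetime([
        '2019-08-01 10:00', '2019-08-01 10:05', '2019-08-01 10:10',
        '2019-08-02 09:00', '2019-08-02 09:02']),
})

short = {'MainScreenAppear': 'A', 'OffersScreenAppear': 'B',
         'CartScreenAppear': 'C', 'PaymentScreenSuccessful': 'D'}

# first occurrence of each event per user, ordered by time
first_seen = (log.groupby(['user_id', 'event_name'])['timestamp']
                 .min().reset_index()
                 .sort_values(['user_id', 'timestamp']))
paths = (first_seen.assign(step=first_seen['event_name'].map(short))
                   .groupby('user_id')['step']
                   .agg('->'.join))
print(paths)
# user 1: A->B->D, user 2: A->D
```

Running the same aggregation on `n_df` would give each user's actual path through the funnel, which the presence/absence pivot cannot distinguish.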

Go Up.

Event funnel: The share of users that proceed from each stage to the next (Total, by Experiment Group)

Total Users:

In [35]:
names = ["A - Main Screen", "B - Offers", "C - Cart", "D - Payment"]
# Tutorial is not part of the purchase funnel, so exclude it to match the four labels
evnn = event_users.query('event_name != "Tutorial"').sort_values(by='user_id',ascending=False)

fig=go.Figure(go.Funnel(
    y = names,
    x = evnn.user_id,
    textinfo = "value+percent previous",
    marker = {"color": "teal"}))
fig.update_layout(title="Event Funnel: Total Users")

fig.show()
  • From the funnel above we can see that just 62% of unique users continue to the offers screen.
  • Just 47.7% of users make it to the payment screen.
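These percentages can be reproduced directly from the unique-user counts in Out[27] (a quick sketch; the Tutorial event is left out, as in the funnel):

```python
import pandas as pd

# unique users per funnel stage, taken from the table above
counts = pd.Series(
    [7419, 4593, 3734, 3539],
    index=['MainScreenAppear', 'OffersScreenAppear',
           'CartScreenAppear', 'PaymentScreenSuccessful'])

funnel = pd.DataFrame({
    'users': counts,
    # share relative to the previous stage
    'pct_previous': (counts / counts.shift(1)).round(3),
    # share relative to the first stage (about 0.477 for the payment stage)
    'pct_initial': (counts / counts.iloc[0]).round(3),
})
print(funnel)
```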

By Experiment Groups:

In [36]:
funnel_by_group = []

for i in n_df.experiment_id.unique():
    group = n_df[(n_df.experiment_id == i) & (n_df.event_name != 'Tutorial')].groupby(
            ['event_name','experiment_id'])['user_id'].nunique().reset_index().sort_values(by = 'user_id',ascending = False)
    funnel_by_group.append(group)
funnel_by_groups = pd.concat(funnel_by_group)
funnel_by_groups
Out[36]:
event_name experiment_id user_id
1 MainScreenAppear 246 2450
2 OffersScreenAppear 246 1542
0 CartScreenAppear 246 1266
3 PaymentScreenSuccessful 246 1200
1 MainScreenAppear 247 2476
2 OffersScreenAppear 247 1520
0 CartScreenAppear 247 1238
3 PaymentScreenSuccessful 247 1158
1 MainScreenAppear 248 2493
2 OffersScreenAppear 248 1531
0 CartScreenAppear 248 1230
3 PaymentScreenSuccessful 248 1181
In [37]:
fig = px.funnel(funnel_by_groups, x='user_id', y = 'event_name', color = 'experiment_id')
fig.update_traces(textinfo = "value+percent previous+percent initial")

fig.show()
  • From the funnel above we can see that the share of unique users is pretty much the same in all experiment groups (61%-63%).
  • It seems that the conversion rate from the main screen in group 246 is the highest of the three groups.
    Can we conclude from this that the change the developers want to make does not affect the users? We'll answer that in the next step....

Go Up.

Check the stage where we lose the most users-

In [38]:
funnel_shift = n_df.query('event_name!= "Tutorial"').groupby('event_name')['user_id'].nunique().sort_values(ascending =False).reset_index()
funnel_shift['perc_ch'] = funnel_shift['user_id'].pct_change()
funnel_shift
Out[38]:
event_name user_id perc_ch
0 MainScreenAppear 7419 NaN
1 OffersScreenAppear 4593 -0.380914
2 CartScreenAppear 3734 -0.187024
3 PaymentScreenSuccessful 3539 -0.052223
In [39]:
fig = px.bar(funnel_shift, x="event_name", y="perc_ch", color="event_name", title="% of users lost by event")
fig.show()

From the graph above, we can see that we lose the most users between the first and second events (close to 40%)!
We suggest the developers review the main landing page; this is much more important than changing the font!

Go Up.

Share of users who make the entire journey from their first event to payment-

In [40]:
all_ev = n_df.groupby(['user_id','event_name'])['timestamp'].count().reset_index()
all_eve = all_ev.pivot_table(index='user_id', columns='event_name',values='timestamp', aggfunc='count').reset_index()
journey_all_events = all_eve.dropna().count()[0]  # users who triggered every event type
journey = n_df.query('event_name == "PaymentScreenSuccessful"').user_id.nunique()

print('The number of users that reached the payment screen is {} and their share of all users is: {:.2%}.'.format(journey,journey/n_df.user_id.nunique()))
The number of users that reached the payment screen is 3539 and their share of all users is: 46.97%.

Go Up.

Step 4. Conclusions

  • The event that repeats the most is "MainScreenAppear" and it constitutes almost 50% of all events.
  • Through the MainScreenAppear stage went 7,419 unique users; OffersScreenAppear - 4,593; CartScreenAppear - 3,734; PaymentScreenSuccessful - 3,539; Tutorial - 840.
  • The Tutorial is not very popular; we need to understand why.
  • 83% of users made more than one purchase, which is a great figure!
  • We lose only about 5% of users between the "CartScreenAppear" and "PaymentScreenSuccessful" events.
  • We lose the most users from the main screen to the offers screen (close to 40%)!
  • We suggest the developers review the main landing page; this is much more important than changing the font!
  • The number of users that made the entire journey is 466, and their share of the "Payment screen" users is 13.17%.

At this point I would therefore suggest that the dev department offer an alternative main screen, on which we could run the next A/B test.

Go Up.

Step 5. Study the results of the experiment

At this stage we will perform an A/A test on the proportions for each event between the control groups; if it turns out well, we will perform an A/B test comparing each control group with the test group.
We can then conclude whether the proposed change yields positive results or is ineffective.

Number of users in each experiment group

In [41]:
A_A_B = n_df.groupby(['experiment_id'])['user_id'].nunique().reset_index()
A_A_B 
Out[41]:
experiment_id user_id
0 246 2484
1 247 2513
2 248 2537

A/A test on control groups 246 & 247, looking for a statistically significant difference.

Before we perform the test, we must make sure that no users appear in more than one group-

In [42]:
n_df.groupby(['user_id'])['experiment_id'].nunique().reset_index().query('experiment_id>1')
Out[42]:
user_id experiment_id

We found that there are no duplications, so we can continue!

We will now create a table that contains the number of users for each event, divided by experiment group:

In [43]:
pivot = n_df.pivot_table(index='event_name', values='user_id', columns='experiment_id', aggfunc=lambda x: x.nunique()).reset_index()
pivot
Out[43]:
experiment_id event_name 246 247 248
0 CartScreenAppear 1266 1238 1230
1 MainScreenAppear 2450 2476 2493
2 OffersScreenAppear 1542 1520 1531
3 PaymentScreenSuccessful 1200 1158 1181
4 Tutorial 278 283 279

It seems that the most popular event is the main screen event, with the highest number of users in each group.

We will now prepare the function that will perform the test between two selected groups for every event:

In [44]:
def check_hypothesis(group1,group2, event, alpha=0.05):
    print('Event name:',event)
    print('--------------------------')
    print('Hypothesis definition:')
    print('--------------------------')
    print('* The significance level (alpha) is: {:.1%}'.format(alpha))
    print('* H0: The share of users who reached event "{}" is equal in groups {} and {}'.format(event,group1,group2))
    print('* H1: The share of users who reached event "{}" is not equal in groups {} and {}'.format(event,group1,group2))
    print('==========================')
    print('*      Test Result:      *')
    print('==========================')

    # successes: unique users who reached the event (taken from the pivot table)
    successes1=pivot[pivot.event_name==event][group1].iloc[0]
    successes2=pivot[pivot.event_name==event][group2].iloc[0]
    
    # trials: total unique users per group, from the original dataframe
    trials1=n_df[n_df.experiment_id==group1]['user_id'].nunique()
    trials2=n_df[n_df.experiment_id==group2]['user_id'].nunique()
    
    #proportion for success in the first group
    p1 = successes1/trials1

    # proportion of successes in the second group
    p2 = successes2/trials2

    # proportion in a combined dataset
    p_combined = (successes1 + successes2) / (trials1 + trials2)

  
    difference = p1 - p2
    
    
    z_value = difference / math.sqrt(p_combined * (1 - p_combined) * (1/trials1 + 1/trials2))

  
    distr = stats.norm(0, 1) 


    p_value = (1 - distr.cdf(abs(z_value))) * 2

    print('p-value: ', p_value)

    if (p_value < alpha):
        print('Reject H0 for',event, 'and groups',group1,group2)
    else:
        print('Fail to Reject H0 for', event,'and groups',group1,group2)  
    print('**********************************************************************************************************\n')
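As a standalone sanity check of the function above, the first comparison it reports (event "CartScreenAppear", groups 246 vs 247) can be reproduced with the same pooled two-proportion z-test; the success and trial counts are the ones shown in the pivot and group tables above:

```python
import math
from scipy import stats

# unique users who reached CartScreenAppear in groups 246 and 247 (from the pivot)
successes1, successes2 = 1266, 1238
# total unique users in groups 246 and 247
trials1, trials2 = 2484, 2513

p1, p2 = successes1 / trials1, successes2 / trials2
# pooled proportion under H0
p_combined = (successes1 + successes2) / (trials1 + trials2)
se = math.sqrt(p_combined * (1 - p_combined) * (1 / trials1 + 1 / trials2))
z_value = (p1 - p2) / se
# two-sided p-value from the standard normal distribution
p_value = 2 * (1 - stats.norm.cdf(abs(z_value)))
print(round(p_value, 4))  # ≈ 0.2288, matching Test number 1 below
```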

Go Up.

We will now examine the events to check whether there is a statistically significant difference between the two control groups.
We'll take alpha to be 5%-

In [45]:
a = 1
for i in pivot.event_name.unique():
    print('Test number',a)
    check_hypothesis(246,247, i, alpha=0.05)
    a= a+1
Test number 1
Event name: CartScreenAppear
--------------------------
Hypothesis definition:
--------------------------
* The significance level (alpha) is: 5.0%
* H0: The share of users who reached event "CartScreenAppear" is equal in groups 246 and 247
* H1: The share of users who reached event "CartScreenAppear" is not equal in groups 246 and 247
==========================
*      Test Result:      *
==========================
p-value:  0.22883372237997213
Fail to Reject H0 for CartScreenAppear and groups 246 247
**********************************************************************************************************

Test number 2
Event name: MainScreenAppear
--------------------------
Hypothesis definition:
--------------------------
* The significance level (alpha) is: 5.0%
* H0: The share of users who reached event "MainScreenAppear" is equal in groups 246 and 247
* H1: The share of users who reached event "MainScreenAppear" is not equal in groups 246 and 247
==========================
*      Test Result:      *
==========================
p-value:  0.7570597232046099
Fail to Reject H0 for MainScreenAppear and groups 246 247
**********************************************************************************************************

Test number 3
Event name: OffersScreenAppear
--------------------------
Hypothesis definition:
--------------------------
* The significance level (alpha) is: 5.0%
* H0: The share of users who reached event "OffersScreenAppear" is equal in groups 246 and 247
* H1: The share of users who reached event "OffersScreenAppear" is not equal in groups 246 and 247
==========================
*      Test Result:      *
==========================
p-value:  0.2480954578522181
Fail to Reject H0 for OffersScreenAppear and groups 246 247
**********************************************************************************************************

Test number 4
Event name: PaymentScreenSuccessful
--------------------------
Hypothesis definition:
--------------------------
* The significance level (alpha) is: 5.0%
* H0: The share of users who reached event "PaymentScreenSuccessful" is equal in groups 246 and 247
* H1: The share of users who reached event "PaymentScreenSuccessful" is not equal in groups 246 and 247
==========================
*      Test Result:      *
==========================
p-value:  0.11456679313141849
Fail to Reject H0 for PaymentScreenSuccessful and groups 246 247
**********************************************************************************************************

Test number 5
Event name: Tutorial
--------------------------
Hypothesis definition:
--------------------------
* The significance level (alpha) is: 5.0%
* H0: The share of users who reached event "Tutorial" is equal in groups 246 and 247
* H1: The share of users who reached event "Tutorial" is not equal in groups 246 and 247
==========================
*      Test Result:      *
==========================
p-value:  0.9376996189257114
Fail to Reject H0 for Tutorial and groups 246 247
**********************************************************************************************************

As per our findings above, our two control groups in the A/A test (samples 246 and 247) show no statistically significant difference between them.

Go Up.

A/B Test for each event in isolation and conclusions-

Now we will examine the events to check whether there is a statistically significant difference between each control group and the test group:

Initially I will perform with a significance level of 5%.
After I will perform with a significance level of 10%.
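The `check_hypothesis` helper used below is defined earlier in the notebook. For reference, the core of such a check is a two-sided z-test for equality of two proportions with a pooled variance estimate; a minimal sketch (the function name and signature here are illustrative, not the notebook's actual helper) could look like this:

```python
import math
from scipy import stats

def z_test_proportions(successes1, trials1, successes2, trials2):
    """Two-sided z-test for the equality of two proportions (pooled variance)."""
    p1 = successes1 / trials1
    p2 = successes2 / trials2
    # Pooled proportion under H0 (the two groups share one success rate)
    p_pooled = (successes1 + successes2) / (trials1 + trials2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / trials1 + 1 / trials2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    return 2 * (1 - stats.norm.cdf(abs(z)))
```

With identical groups the p-value is 1.0 (no evidence of a difference), and it shrinks as the observed proportions diverge.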

In [46]:
print('\t\t\t\t\t\tA=246, B=248\n')
for a, i in enumerate(pivot.event_name.unique(), start=1):
    print('Test number', a)
    check_hypothesis(246, 248, i, alpha=0.05)
						A=246, B=248

Test number 1
Event name: CartScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 5.0%
* H0: The number of success in the given event: "CartScreenAppear" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "CartScreenAppear" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.07842923237520116
Fail to Reject H0 for CartScreenAppear and groups 246 248
**********************************************************************************************************

Test number 2
Event name: MainScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 5.0%
* H0: The number of success in the given event: "MainScreenAppear" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "MainScreenAppear" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.2949721933554552
Fail to Reject H0 for MainScreenAppear and groups 246 248
**********************************************************************************************************

Test number 3
Event name: OffersScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 5.0%
* H0: The number of success in the given event: "OffersScreenAppear" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "OffersScreenAppear" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.20836205402738917
Fail to Reject H0 for OffersScreenAppear and groups 246 248
**********************************************************************************************************

Test number 4
Event name: PaymentScreenSuccessful
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 5.0%
* H0: The number of success in the given event: "PaymentScreenSuccessful" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "PaymentScreenSuccessful" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.2122553275697796
Fail to Reject H0 for PaymentScreenSuccessful and groups 246 248
**********************************************************************************************************

Test number 5
Event name: Tutorial
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 5.0%
* H0: The number of success in the given event: "Tutorial" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "Tutorial" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.8264294010087645
Fail to Reject H0 for Tutorial and groups 246 248
**********************************************************************************************************

It seems that with alpha = 5%, there is no significant difference between the groups.

In [47]:
print('\t\t\t\t\t\tA=247, B=248\n')
for a, i in enumerate(pivot.event_name.unique(), start=1):
    print('Test number', a)
    check_hypothesis(247, 248, i, alpha=0.05)
						A=247, B=248

Test number 1
Event name: CartScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 5.0%
* H0: The number of success in the given event: "CartScreenAppear" of the group B- 247 equals to the group A- 248
* H1: The number of success in the given event: "CartScreenAppear" of the group B- 247 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.5786197879539783
Fail to Reject H0 for CartScreenAppear and groups 247 248
**********************************************************************************************************

Test number 2
Event name: MainScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 5.0%
* H0: The number of success in the given event: "MainScreenAppear" of the group B- 247 equals to the group A- 248
* H1: The number of success in the given event: "MainScreenAppear" of the group B- 247 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.4587053616621515
Fail to Reject H0 for MainScreenAppear and groups 247 248
**********************************************************************************************************

Test number 3
Event name: OffersScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 5.0%
* H0: The number of success in the given event: "OffersScreenAppear" of the group B- 247 equals to the group A- 248
* H1: The number of success in the given event: "OffersScreenAppear" of the group B- 247 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.9197817830592261
Fail to Reject H0 for OffersScreenAppear and groups 247 248
**********************************************************************************************************

Test number 4
Event name: PaymentScreenSuccessful
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 5.0%
* H0: The number of success in the given event: "PaymentScreenSuccessful" of the group B- 247 equals to the group A- 248
* H1: The number of success in the given event: "PaymentScreenSuccessful" of the group B- 247 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.7373415053803964
Fail to Reject H0 for PaymentScreenSuccessful and groups 247 248
**********************************************************************************************************

Test number 5
Event name: Tutorial
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 5.0%
* H0: The number of success in the given event: "Tutorial" of the group B- 247 equals to the group A- 248
* H1: The number of success in the given event: "Tutorial" of the group B- 247 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.765323922474501
Fail to Reject H0 for Tutorial and groups 247 248
**********************************************************************************************************

It seems that with alpha = 5%, there is no significant difference between the groups.

Go Up.

Now we'll set alpha to 10%:

In [48]:
print('\t\t\t\t\t\tA=246, B=248\n')
for a, i in enumerate(pivot.event_name.unique(), start=1):
    print('Test number', a)
    check_hypothesis(246, 248, i, alpha=0.1)
						A=246, B=248

Test number 1
Event name: CartScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 10.0%
* H0: The number of success in the given event: "CartScreenAppear" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "CartScreenAppear" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.07842923237520116
Reject H0 for CartScreenAppear and groups 246 248
**********************************************************************************************************

Test number 2
Event name: MainScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 10.0%
* H0: The number of success in the given event: "MainScreenAppear" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "MainScreenAppear" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.2949721933554552
Fail to Reject H0 for MainScreenAppear and groups 246 248
**********************************************************************************************************

Test number 3
Event name: OffersScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 10.0%
* H0: The number of success in the given event: "OffersScreenAppear" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "OffersScreenAppear" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.20836205402738917
Fail to Reject H0 for OffersScreenAppear and groups 246 248
**********************************************************************************************************

Test number 4
Event name: PaymentScreenSuccessful
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 10.0%
* H0: The number of success in the given event: "PaymentScreenSuccessful" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "PaymentScreenSuccessful" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.2122553275697796
Fail to Reject H0 for PaymentScreenSuccessful and groups 246 248
**********************************************************************************************************

Test number 5
Event name: Tutorial
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 10.0%
* H0: The number of success in the given event: "Tutorial" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "Tutorial" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.8264294010087645
Fail to Reject H0 for Tutorial and groups 246 248
**********************************************************************************************************


We found that at a significance level of 10%, there is a significant difference in the "CartScreenAppear" event.
Is the null hypothesis rejected because there really is a significant difference between the two groups for this event?
Or is it rejected because we should have corrected the significance level to account for running multiple tests?
We will answer this question in the next step.

Go Up.

Bonferroni correction for previous tests

Bonferroni: a_new = a/m, where a is the significance level and m is the number of tests.
a = 0.1
m = 4 (without the Tutorial event)

new a = 0.1/4 = 0.025

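The same Bonferroni adjustment can be applied directly to the p-values collected above (the values below are copied from the test outputs; the variable names are illustrative):

```python
# p-values from the 246 vs 248 tests above (Tutorial excluded per m = 4)
p_values = [
    0.07842923237520116,  # CartScreenAppear
    0.2949721933554552,   # MainScreenAppear
    0.20836205402738917,  # OffersScreenAppear
    0.2122553275697796,   # PaymentScreenSuccessful
]

alpha = 0.10
m = len(p_values)
alpha_corrected = alpha / m  # Bonferroni: 0.1 / 4 = 0.025

for p in p_values:
    verdict = 'Reject H0' if p < alpha_corrected else 'Fail to reject H0'
    print(f'p = {p:.4f} -> {verdict}')
```

Against the corrected threshold of 0.025, none of the four p-values lead to rejecting H0, matching the rerun of `check_hypothesis` below.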
In [49]:
for a, i in enumerate(pivot.event_name.unique(), start=1):
    print('Test number', a)
    check_hypothesis(246, 248, i, alpha=0.10/4)
Test number 1
Event name: CartScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 2.5%
* H0: The number of success in the given event: "CartScreenAppear" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "CartScreenAppear" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.07842923237520116
Fail to Reject H0 for CartScreenAppear and groups 246 248
**********************************************************************************************************

Test number 2
Event name: MainScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 2.5%
* H0: The number of success in the given event: "MainScreenAppear" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "MainScreenAppear" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.2949721933554552
Fail to Reject H0 for MainScreenAppear and groups 246 248
**********************************************************************************************************

Test number 3
Event name: OffersScreenAppear
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 2.5%
* H0: The number of success in the given event: "OffersScreenAppear" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "OffersScreenAppear" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.20836205402738917
Fail to Reject H0 for OffersScreenAppear and groups 246 248
**********************************************************************************************************

Test number 4
Event name: PaymentScreenSuccessful
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 2.5%
* H0: The number of success in the given event: "PaymentScreenSuccessful" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "PaymentScreenSuccessful" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.2122553275697796
Fail to Reject H0 for PaymentScreenSuccessful and groups 246 248
**********************************************************************************************************

Test number 5
Event name: Tutorial
--------------------------
Hypothesis definision:
--------------------------
* The significant level (alpha) is: 2.5%
* H0: The number of success in the given event: "Tutorial" of the group B- 246 equals to the group A- 248
* H1: The number of success in the given event: "Tutorial" of the group B- 246 is not equals to the group A- 248
==========================
*      Test Result:      *
==========================
p-value:  0.8264294010087645
Fail to Reject H0 for Tutorial and groups 246 248
**********************************************************************************************************

After applying the Bonferroni correction, we find no significant difference between the groups; the rejection in the previous, uncorrected test was most likely a false positive (Type I error).

Go Up.

Step 5. Conclusion

At this point we wanted to find out whether the change the development department wants to make affects the step-by-step conversion rate and, more generally, the overall conversion rate.

The findings are-

  1. The control groups are proportional to each other, which indicates that the data was collected correctly.
  2. The most popular event is the main screen event, which makes sense. In the A/B tests it was found that between groups 246 and 248 there is a significant difference in the proportions of the "CartScreenAppear" event at a significance level of 10%.
  3. For the sake of control and accuracy, we wanted to check whether this difference was due to a Type I error, and therefore we applied a correction to the significance level (Bonferroni).
  4. After correcting the significance level, it appears that there is no significant difference between the groups.

To summarize the tests-
It seems that the proposed change did not yield results that constitute a significant improvement for the company,
so I would recommend not adopting it.

Overall Conclusion

In the analysis process we performed the following:

Preparation for processing-

  • We identified the relevant date range, which tells the true story of user behavior.
  • We removed duplicate rows that could have affected our final results.
  • We converted Unix timestamps to readable dates.

Learning the data-

  • We have seen that there are 5 unique events.
  • The relevant period for the analysis is 7 days long.
  • After filtering, we lost 17 unique users and about 2% of the data.

Learning the funnel-

  • Most of the logged activity is associated with the main screen.
  • About 50% of the unique users reached the payment stage.
  • More than 83% made a purchase more than once.
  • Almost 97% visited the app more than once.
  • Most users do not go through all the steps to reach the payment stage.
  • Users rarely open the Tutorial; this is a statistic worth researching.
  • We lose the most users between the first and second stages.

A/B Tests

  • The control groups are proportional to each other.
  • In the A/B test between groups 246 and 248 we found a significant difference in the proportions of the "CartScreenAppear" event, but after the Bonferroni correction we found that this difference was actually a Type I error.


After studying and analyzing the data, I offer the following recommendations:

  1. Do not make the proposed change.
    According to the tests we performed, the change does not appear to result in a higher conversion rate at any stage.
  2. I would recommend finding out why only an absolute minority of users use the Tutorial.
  3. I recommend researching the main screen to see why so many users leave it, and reducing the drop-off between this event and the event that follows.

Go Up.